-
-
Notifications
You must be signed in to change notification settings - Fork 147
fix(series): arithmetics for Series[Any] #1343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(series): arithmetics for Series[Any] #1343
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only issues I see are with respect to the code inside of if TYPE_CHECKING_INVALID_USAGE
that you added.
The concern here is that some of the lines will execute fine if the Series
comes from a DataFrame
and has the correct type inside, but we see that it is a static Series[Any]
. And vice versa.
I think I prefer where if you are doing an operation like subtraction where it will sometimes work and sometimes not work, and the inferred type of one of the operands is Series[Any]
, then we detect that as a typing problem. But we need to be selective.
For example,
df = pd.DataFrame({"a": [1,2,3], "b": pd.to_datetime(["1/1/2025", "2/1/2025", "3/1/2025"])})
sa = df["a"]
sb = df["b"]
sa - pd.Timestamp("1/1/2024") # fails at runtime
sb - pd.Timestamp("1/1/2024") # works at runtime
Here sa
and sb
are Series[Any]
(mypy) or Series[Unknown]
(pyright). So the typing either has to accept both cases or reject both cases.
I think we have to be selective here, and probably disallow subtraction with untyped Series
when the other argument is known to be time related (Timestamp
, Timedelta
and associated Series
) or is a string or Series[str]
. I think the current stubs are more permissive, but now I'm not sure that's the right thing to do.
Hi @Dr-Irv , thank you for drafting the plan. Current plan
I would like to summarise this typing plan as following:
|
The challenge here is the issue of wide vs narrow types. See https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#narrow-vs-wide-arguments for some writeup I did about that. Let's consider this example from your list:
In what is in si = pd.DataFrame({"a": pd.Series([1,2,3])})["a"]
st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
result = si - st I think we do a better service to users if we actually catch this via typing, i.e., for This makes the user then Let's also consider this example: st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
sd = pd.DataFrame({"a": [pd.Timedelta("1 day"), pd.Timedelta("2 days"), pd.Timedelta("3 days")]})["a"]
result = st - sd In this case, if we adopt my proposal, the type checker would say that result = st - cast("pd.Series[Timedelta]", sd) which is telling the type checker "I know this is a series of timedeltas" I'm choosing what I consider to be a happy medium here between your proposal, and something that would be too narrow (e.g., disallowing I should say that the current behavior in the stubs is from 3 years ago when we first inherited the project from something MIcrosoft had started, and now that I have more experience with typing, as well as using the stubs in my own code, I've come around to trying to find more things with static type checking if we can find them, then not. So the summary of my proposal is (with respect to
Let me know your thoughts on that. |
Hi @Dr-Irv , thank you again for the detailed reply.
|
Or we can catch it in type checking??
Yes, this latter point is quite valid. Here's a possible modification of my proposal: (with respect to
Is this possible? |
I tried to implement the plan in e0b5b59. The |
So should I review now? |
Yes please. I believe I have implemented the ideas in #1343 (comment). I would like to follow the proposal there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured out how to handle the operators with str
. See 9c972d3
I put the tests I used in one file, figuring you can spread them out accordingly.
The issue is that _ListLike
includes Sequence[S1]
, which matches a str
, so the key was to use SequenceNotStr[S1]
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty close. Just issue in using pytest.raises
.
I also think it would be worth adding a subsection to https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#use-of-generic-types that explains the philosophy we've agreed to. Namely, since df["a"]
has unknown type, we restrict arithmetic when the the other operand is Timedelta
, Timestamp
, str
, etc., and their Series
variants. Then we tell people they should use cast
to tell the type checker "I know that this column of my DataFrame
has the right type"
There's been another issue with the overloads that has caused the dev version of mypy to fail on the current main. I discovered the problem, and if you look at this set of diffs, it fixes the problem, working with both mypy 1.17.1 and the dev version. Can you incorporate these changes into your PR? 9c972d3...Dr-Irv:pandas-stubs:d3b8f164b2dc469146d43d49fd73eabc7ebe643a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still would like to not use the pytest.raises
pattern.
I like the docs. Thanks for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @cmp0xff . Very nice contribution.
Question for you - I could do a release with these latest changes, because we have all the arithmetic working, and only have the work to get rid of TimestampSeries
and TimedeltaSeries
left to do.
So do you think I should release this now, or wait?
Hi @Dr-Irv , thank you for asking me, I am honoured. If I were in a the position to release a new version, I would do so now and save dropping |
It's done. Just released 2.3.2.250827 |
This PR implements the ideas from #1274 (comment) and #1274 (comment).
assert_type()
to assert the type of any return value